fix: singleton WorkflowService + mark task failed on load error#153
Open
devteamaegis wants to merge 1 commit into
Open
Conversation
Two logic bugs in the workflow backend: 1. get_service() created a *new* WorkflowService instance on every HTTP request. Because active_tasks / workflow_tasks / cancel_events are stored on the instance, any follow-up request (status poll, cancel, log fetch) hit a fresh empty service and got 404 / "Task not found". Fixed by promoting the instance to a module-level singleton. 2. When Workflow.load_from_file raised an exception the handler called print() and returned without updating active_tasks[task_id].status, leaving the task permanently stuck as "running". The caller could never know the workflow failed to start. Fixed by setting status to "failed", populating the error field, and routing the message through the structured log instead of stdout.
Contributor
There was a problem hiding this comment.
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="workflows/backend/tests/test_service_singleton_and_load_failure.py">
<violation number="1" location="workflows/backend/tests/test_service_singleton_and_load_failure.py:64">
P2: Module-level mocking of `aiofiles` and `yaml` leaks stubbed modules into the shared test process and can break unrelated tests.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
| @@ -0,0 +1,222 @@ | |||
| """ | |||
Contributor
There was a problem hiding this comment.
P2: Module-level mocking of aiofiles and yaml leaks stubbed modules into the shared test process and can break unrelated tests.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At workflows/backend/tests/test_service_singleton_and_load_failure.py, line 64:
<comment>Module-level mocking of `aiofiles` and `yaml` leaks stubbed modules into the shared test process and can break unrelated tests.</comment>
<file context>
@@ -0,0 +1,222 @@
+_aiofiles_open_cm.__aenter__ = AsyncMock(return_value=_aiofiles_file)
+_aiofiles_open_cm.__aexit__ = AsyncMock(return_value=False)
+_aiofiles_mock.open = MagicMock(return_value=_aiofiles_open_cm)
+sys.modules["aiofiles"] = _aiofiles_mock
+
+sys.modules["yaml"].safe_load = MagicMock(return_value={})
</file context>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two independent logic bugs in
workflows/backend/:Bug 1 —
get_service()creates a new instance on every requestactive_tasks,workflow_tasks, andcancel_eventsare stored on the instance.A
POST /executerequest builds service A and stores the task there.Any follow-up
GET /tasks/{id}/statusorPOST /tasks/{id}/cancelbuilds service B (empty) and always returns 404 / "Task not found" — the task status and cancel endpoints are effectively broken.Fix: promote to a module-level singleton.
Bug 2 — load failure leaves task stuck as
'running'If
Workflow.load_from_fileraises (corrupt file, missing deps, schema error), the task is never marked'failed'. Callers poll forever seeingstatus: runningwith no error details. The message also went to stdout instead of the structured log.Fix: set
status = 'failed', populateerror, write to the log file.Tests
workflows/backend/tests/test_service_singleton_and_load_failure.py— 6 unit tests, no browser/LLM required:test_returns_workflow_service_instanceget_service()returns aWorkflowServicetest_same_instance_on_repeated_callstest_task_state_survives_across_callstest_singleton_reset_creates_new_instancetest_load_failure_sets_status_to_failed'failed'after load raisestest_load_failure_logs_errorSummary by cubic
Fixes two backend bugs: make
WorkflowServicea module-level singleton and mark tasks as failed (with logged errors) when workflow loading fails. This restores persistent task state across requests and prevents tasks from getting stuck as running.get_service()now returns a sharedWorkflowServiceinstance, soactive_tasks,workflow_tasks, andcancel_eventspersist and status/cancel/log endpoints find tasks instead of 404.Workflow.load_from_filethrows, set task status to'failed', populateerror, and write the message to the task log (no stdout), so clients stop polling and see the failure reason.Written for commit 10fc4e1. Summary will update on new commits. Review in cubic